Learning linguistically valid pronunciations from acoustic data

نویسندگان

  • Françoise Beaufays
  • Ananth Sankar
  • Shaun Williams
  • Mitch Weintraub
چکیده

We describe an algorithm to learn word pronunciations from acoustic data. The algorithm jointly optimizes the pronunciation of a word using (a) the acoustic match of this pronunciation to the observed data, and (b) how “linguistically reasonable” the pronunciation is. Variations of word pronunciations in the recognition dictionary (which was created by linguists), are used to train a model of whether new hypothesized pronunciations are reasonable or not. The algorithm is well-suited for proper name pronunciation learning. Experiments on a corporate name dialing database show 40% error rate reduction with respect to a letter-to-phone pronunciation engine.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Linguistically Valid Pronun

We describe an algorithm to learn word pronunciations from acoustic data. The algorithm jointly optimizes the pronunciation of a word using (a) the acoustic match of this pronunciation to the observed data, and (b) how “linguistically reasonable” the pronunciation is. Variations of word pronunciations in the recognition dictionary (which was created by linguists), are used to train a model of w...

متن کامل

Pronunciation Learning with RNN-Transducers

Most speech recognition systems rely on pronunciation dictionaries to provide accurate transcriptions. Typically, some pronunciations are carved manually, but many are produced using pronunciation learning algorithms. Successful algorithms must have the ability to generate rich pronunciation variants, e.g. to accommodate words of foreign origin, while being robust to artifacts of the training d...

متن کامل

An investigation of acoustic models for multilingual code-switching

Multilingual speech processing continues to develop as speech technology spreads to heterogeneous clients and applications. We address a distinct problem of code-switching — the spontaneous but occasional use, within speech in one language (referred to as L1), of words, phrases, expressions or idioms from a second language (L2). We examine two alternatives for modeling the acoustics of such wor...

متن کامل

Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech Recognition

Non-native speech differs significantly from native speech, often resulting in a degradation of the performance of automatic speech recognition (ASR). Hand-crafted pronunciation lexicons used in standard ASR systems generally fail to cover non-native pronunciations, and design of new ones by linguistic experts is time consuming and costly. In this work, we propose acoustic data-driven iterative...

متن کامل

Learning from Mistakes: Expanding Pronunciation Lexicons Using Word Recognition Errors

We introduce the problem of learning pronunciations of out-ofvocabulary words from word recognition mistakes made by an automatic speech recognition (ASR) system. This question is especially relevant in cases where the ASR engine is a black box – meaning that the only acoustic cues about the speech data come from the word recognition outputs. This paper presents an expectation maximization appr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003